Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Soft-Float] - Initial Interpreter Implementation of Ps2's floating point unit specification #12001

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

GitHubProUser67
Copy link

@GitHubProUser67 GitHubProUser67 commented Nov 12, 2024

This Pull Request implements the first take ever on real Soft-Float support in PCSX2.

This work is a combination or several efforts and researches done prior.

Credits:

This pull request should be tested with every games requiring a clamping/rounding mode/float patches (cf: GameDatabase).

Currently, this PR fixes on the interpreters:

This is important to note, that this implementation, while technically fixing Gran Turismo 4 and Klonoa 2, makes the games crash due to very high floats being passed in the emu code, and failing at some points later in the process. This has not yet been ironed-out.

Other than that, this sets the floor for Soft-Float in PCSX2, a long awaited contribution.

@GitHubProUser67 GitHubProUser67 changed the title [Soft-Float] - Initial Intepreter Implementation of Ps2's floating point uint specification [Soft-Float] - Initial Intepreter Implementation of Ps2's floating point unit specification Nov 12, 2024
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for submitting a contribution to PCSX2

As this is your first pull request, please be aware of the contributing guidelines.

Additionally, as per recent changes in GitHub Actions, your pull request will need to be approved by a maintainer before GitHub Actions can run against it. You can find more information about this change here.

Please be patient until this happens. In the meantime if you'd like to confirm the builds are passing, you have the option of opening a PR on your own fork, just make sure your fork's master branch is up to date!

@GitHubProUser67 GitHubProUser67 changed the title [Soft-Float] - Initial Intepreter Implementation of Ps2's floating point unit specification [Soft-Float] - Initial Interpreter Implementation of Ps2's floating point unit specification Nov 12, 2024
@seta-san
Copy link
Contributor

Does this work on the recompilers or just interpreter?

@refractionpcsx2
Copy link
Member

You should try reading the title.

…oint unit specification.

This Pull Request implements the first take ever on real Soft-Float support in PCSX2.

This work is a combination or several efforts and researches done prior.

Credits:

- https://www.gregorygaines.com/blog/emulating-ps2-floating-point-nums-ieee-754-diffs-part-1/

- https://github.com/GitHubProUser67/MultiServer3/blob/main/BackendServices/CastleLibrary/EmotionEngine.Emulator/Ps2Float.cs

- https://github.com/Goatman13/pcsx2/tree/accurate_int_add_sub

- PCSX2 Team for their help and support in this massive journey.

This pull request should be tested with every games requiring a clamping/rounding mode (cf: GameDatabase).

Currently, this PR fixes on the interpreters:

- PCSX2#354

- PCSX2#11507

- PCSX2#10519

- PCSX2#8068

- PCSX2#7642

- PCSX2#5257

This is important to note, that this implementation, while technically fixing Gran Turismo 4 and Klonoa 2, makes the games crash due to very high floats being passed in the emu code, and failing at some points later in the process. This has not yet been ironed-out.

Other than that, this sets the floor for Soft-Float in PCSX2, a long awaited contribution.
@seta-san
Copy link
Contributor

You should try reading the title.

I don’t know why my brain just skimmed over that.

@MrCK1
Copy link
Member

MrCK1 commented Nov 12, 2024

I ran a bunch of tests for "Test Drive Unlimited" AI during the demo scene after sitting idle on the menu. No combination of settings/interpreters seems to have any effect on behavior. There must be something else going on

@AmyRoxwell
Copy link

AmyRoxwell commented Nov 13, 2024

I can confirm with the multiplication setting that MHG and Dos not longer have bugged bounces! (Plus the build works on linux well! [as well as running interpreter can do, of course :'D])
Screenshot_20241112_234106
Monster Hunter G_SLPM-65869_20241112234029
Monster Hunter 2_SLPM-66280_20241112235154

@Shoegzer
Copy link

Shouldn't this help with #2990 as well?

@AmyRoxwell
Copy link

Shouldn't this help with #2990 as well?

I remember seeing on the public dev channel that stuntman not longer had AI issues with this and not longer the car AI failed? it's been a while

@Blackbird88
Copy link
Contributor

Blackbird88 commented Nov 13, 2024

Tourist Trophy NTSC works now too! For the first time License Tests in it work!
Requires: EE MUL/DIV

Some of the tests still don't work however.

Works
image
image

Hangs
image

@Tokman5
Copy link
Contributor

Tokman5 commented Nov 13, 2024

I tested the demo replays of Tokyo Xtreme Racer Zero (see issue #5597 ) and noticed that the car movement in interpreter mode is now closer to the movement in recompiler mode.
However, there are still slight differences between the two, and both are far from matching the console playback.

comp.mp4

@Goatman13
Copy link
Contributor

Goatman13 commented Nov 13, 2024

Nice work!

I tested only 2 games for now. Hype Time Quest which if i remember correctly needs accurate fpu mul, and it works fine without patch.

Second game is THPS4. While here accurate VU1 add/sub fix issue which is normally fixed by VU1 rounding in database, accurate mul/div breaks game graphics. Of course, game don't need accurate VU1 mul/div to work correctly, but something seems to be off when enabled.

Accurate vu1 add/sub/sqrt:
image
Accurate vu1 mul/div (same place in game):
image

Edit: Maybe THPS4 is just passing some big float value to GS with accurate mul/div. That may not necessarily be an issue with the float operation itself.
Additionally, I tested Burnout 2 and accurate VU1 add/sub fixed issue that was previously fixed by VU1 rounding mode (white car parts).

Edit2: Freaky Flyers is fixed with accurate mul/div on VU1. Awesome! We weren't even sure if that's floating points issue until now.

The game sends some super low floats to the Mul unit.

On PS2, floats with exponent zero should return zero, but this is not the case in Mul, the multiplier can work with denormals internally.

I love when undocumented stuff is used by some games for their 3D engine ^^.
@GitHubProUser67
Copy link
Author

GitHubProUser67 commented Nov 13, 2024

The Tony Hawk case is fixed, the game uses an un-documented behaviour in it's 3D engine.

The PS2 has no denormals support .... except in the Mul unit apparently.

The behaviour is now emulated properly.

@Shoegzer
Copy link

I remember seeing on the public dev channel that stuntman not longer had AI issues with this and not longer the car AI failed? it's been a while

@AmyRoxwell Can you provide a reference to this? There's no indication of it being fixed in #2990, and I assume the devs would have closed it if it were. In any event it involves pathing in Driver 3 as well.

@weirdbeardgame
Copy link
Contributor

weirdbeardgame commented Nov 13, 2024

2024-11-10.14-28-32.mp4

This also affects the Fatal Frame 1 issue. Meaning this + the current GameDB patch will end up being the ultimate fix

More accurate approach to compare.
@Goatman13
Copy link
Contributor

#3200 is fixed when tested with b7f3806
Require accurate mul/div for FPU.
I didn't tested, but Krome studio games should profit from this pr too, ones that have patches in game db. Like Spyro, Star Wars, one of Transformers game.

@ghost
Copy link

ghost commented Nov 13, 2024

This pr's EE interpreter fixes #11636 's gamedb issue.

Implements accurate SQRT options, also removes Tri-Ace hack, which isn't needed anymore on the interpreter.
@AmyRoxwell
Copy link

I remember seeing on the public dev channel that stuntman not longer had AI issues with this and not longer the car AI failed? it's been a while

@AmyRoxwell Can you provide a reference to this? There's no indication of it being fixed in #2990, and I assume the devs would have closed it if it were. In any event it involves pathing in Driver 3 as well.

I meant like, while using this PR, not that is has been fixed. Sorry if it was misunderstood. But if it's not mention on the PR maybe the thing it needs it's not here by this initial implementation.

@GitHubProUser67
Copy link
Author

Driv3r seemed fine when it was tested, Stuntman NTSC is a lot better but still can "slightly" deviate.

I suspect it is once again, the interpreter rounding/clamping values somewhere.

@Shoegzer
Copy link

@AmyRoxwell Ah, I understand you now. That's great news.

@GitHubProUser67 Thanks, nice to see Driv3r is looking better. Would it make sense to list these games in your OP?

Copy link
Contributor

@TheLastRar TheLastRar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had skimmed though this a little while ago and this was what I picked up on.

pcsx2/Ps2Float.cpp Outdated Show resolved Hide resolved
pcsx2/VUops.cpp Outdated
Comment on lines 1051 to 1062
VECTOR* tmp;
VECTOR* dst;
if (_Fd_ == 0)
dst = &RDzero;
else
dst = &VU->VF[_Fd_];

if (_X) dst->i.x = VU_MACx_UPDATE(VU, vuDouble(VU->ACC.i.x) + ( vuDouble(VU->VF[_Fs_].i.x) * vuDouble(VU->VF[_Ft_].i.x))); else VU_MACx_CLEAR(VU);
if (_Y) dst->i.y = VU_MACy_UPDATE(VU, vuDouble(VU->ACC.i.y) + ( vuDouble(VU->VF[_Fs_].i.y) * vuDouble(VU->VF[_Ft_].i.y))); else VU_MACy_CLEAR(VU);
if (_Z) dst->i.z = VU_MACz_UPDATE(VU, vuDouble(VU->ACC.i.z) + ( vuDouble(VU->VF[_Fs_].i.z) * vuDouble(VU->VF[_Ft_].i.z))); else VU_MACz_CLEAR(VU);
if (_W) dst->i.w = VU_MACw_UPDATE(VU, vuDouble(VU->ACC.i.w) + ( vuDouble(VU->VF[_Fs_].i.w) * vuDouble(VU->VF[_Ft_].i.w))); else VU_MACw_CLEAR(VU);
tmp = &VU->TMP;
if (_X) {tmp->i.x = vuAccurateMulDiv(VU, VU->VF[_Fs_].i.x, VU->VF[_Ft_].i.x, 0); dst->i.x = VU_MACx_UPDATE(VU, vuAccurateAddSub(VU, VU->ACC.i.x, tmp->i.x, 0));} else VU_MACx_CLEAR(VU);
if (_Y) {tmp->i.y = vuAccurateMulDiv(VU, VU->VF[_Fs_].i.y, VU->VF[_Ft_].i.y, 0); dst->i.y = VU_MACy_UPDATE(VU, vuAccurateAddSub(VU, VU->ACC.i.y, tmp->i.y, 0));} else VU_MACy_CLEAR(VU);
if (_Z) {tmp->i.z = vuAccurateMulDiv(VU, VU->VF[_Fs_].i.z, VU->VF[_Ft_].i.z, 0); dst->i.z = VU_MACz_UPDATE(VU, vuAccurateAddSub(VU, VU->ACC.i.z, tmp->i.z, 0));} else VU_MACz_CLEAR(VU);
if (_W) {tmp->i.w = vuAccurateMulDiv(VU, VU->VF[_Fs_].i.w, VU->VF[_Ft_].i.w, 0); dst->i.w = VU_MACw_UPDATE(VU, vuAccurateAddSub(VU, VU->ACC.i.w, tmp->i.w, 0));} else VU_MACw_CLEAR(VU);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for using a global TMP vector?
This can just be using a locally defined one, right?

ditto for the other fused ops

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason behind the global vector is to allow for more generalised debugging when doing an FMA operation.

The use cases where it can be used, is when you don't know where you float went wrong, and it happened in an FMA instruction, so you could simply hook the global TMP and see all the values the FMA processed in between.

Ultimately you decide if it is worth or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not have it personally
The less global a variable, the less trouble it causes

pcsx2/VUflags.h Outdated Show resolved Hide resolved
@GitHubProUser67 GitHubProUser67 force-pushed the soft-float-int branch 2 times, most recently from 36c0d7d to 5c5febd Compare November 21, 2024 22:19
More in line with the PCSX2 code-base.
Copy link
Member

@TellowKrinkle TellowKrinkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more general design, I think PS2Float should hold the float directly, with methods to extract the pieces, rather than storing the parts separately. This would save deconstructing and then reconstructing floats to do operations like abs and negate, which really just need to and or xor with some bits.

e.g.

class PS2Float
{
public:
	u32 raw;
	constexpr explicit PS2Float(u32 raw_): raw(raw_) {}
	constexpr u32 Mantissa() const { return (raw & 0x7fffff) | 0x800000; }
	constexpr u8 Exponent() const { return (raw >> 23) & 0xff; }
	constexpr bool Sign() const { return raw >> 31; }
	constexpr PS2Float Abs() const { return { raw & 0x7fffffff }; }
	constexpr PS2Float Negate() const { return { raw ^ 0x80000000 }; }
};

(If we really want to let the compiler optimize well, we should implement add/sub/etc as static functions that take both lhs and rhs by value, then we can mark them with __attribute__((const)) and let the compiler assume they won't touch any global variables or anything, which might be important if we're hoping the compiler will hoist all those CHECK_VU_SOFT_ADDSUB checks outside of the four if statements in the VUs, though I'm still kind of doubtful that it will)

pcsx2-qt/Settings/AdvancedSettingsWidget.ui Outdated Show resolved Hide resolved
pcsx2/FPU.cpp Outdated Show resolved Hide resolved
pcsx2/FPU.cpp Outdated Show resolved Hide resolved
Comment on lines 807 to 831
s32 PS2Float::clz(s32 x)
{
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;

return debruijn32[(u32)x * 0x8c0b2891u >> 26];
}

s32 PS2Float::BitScanReverse8(s32 b)
{
return msb[b];
}

s32 PS2Float::GetMostSignificantBitPosition(u32 value)
{
for (s32 i = 31; i >= 0; i--)
{
if (((value >> i) & 1) != 0)
return i;
}
return -1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have helpers for these that map to the associated cpu instructions

IIRC we're just using MS names, so int _BitScanReverse(unsigned long* const Index, const unsigned long Mask)
If you don't like the weird signature (e.g. using long instead of u32/u64), maybe update common/BitUtils.h with a nicer one instead of defining one here.

pcsx2/Ps2Float.h Outdated Show resolved Hide resolved
pcsx2/Ps2Float.h Outdated Show resolved Hide resolved
pcsx2/VUops.cpp Outdated Show resolved Hide resolved
pcsx2/VUops.cpp Outdated Show resolved Hide resolved
This broke stuff on very high floats, 0x40 is an un-biased value that can't be changed from IEEE standard.

We now 100% match the PS3's SPEs, but not the PS2 Div result (can be off by one bit). However, this is still way better than IEEE754.
@Goatman13
Copy link
Contributor

#5070 fixed with accurate VU0 Mul/Div for COP2 (require EE interpreter like Dirge of Cerberus).

@bigol83
Copy link

bigol83 commented Nov 22, 2024

#5070 fixed with accurate VU0 Mul/Div for COP2 (require EE interpreter like Dirge of Cerberus).

Oh, i suspected it was something like this, nice.

Here is a video

2024-11-22.22-36-21.mp4

@Goatman13
Copy link
Contributor

#4546 fixed with accurate VU1 Mul/Div. :)

@GitHubProUser67
Copy link
Author

There are 3 code styles remaining, The removal of the Global VU tpm, PCSX2's own helpers in PS2Float and improvement of the PS2Float class to store the raw float.

Applies some recommendations from the review.

The next batch will come later.
@Goatman13
Copy link
Contributor

At some point you changed back mantissa normalization constant for Div. From 0x39 to 0x40 which is used in IEEE754.
I'm just curious if previous value was incorrect and you wanted to use 0x3F initially? I didn't test anything, just noticed that 0x39 is little bit weird here, like confused hex with decimal accidentally.
This commit: b0b65fa

@GitHubProUser67
Copy link
Author

GitHubProUser67 commented Nov 23, 2024

I see you are well documented on the subject :) I will explain to you why this "experiment" failed.

The current Div/Sqrt system mimic the PS3 SPEs (I have more experience with that system), and they uses a IEEE "like" divisor that allowed me to test what is closer to PS2 hardware. Their rounding normalization is exactly using IEEE value from the tests I did.

The PS2 Div is a mess, they round to nearest by default, but their "rounding" is completely custom. So I took the PS3 hardware as a basis and tried to "lower" the rounding delta to see what is closer to PS2 hardware, 0x39 seemed fine, but in actual fact, that was a hack.

Changing this value introduced a slight BIAS in the result, which can affect high floats values and makes them incorrect. 0X3F was still "too high" so to speak.

Current Div/Sqrt system is exactly like a PS3 emulating the PS2 in therms of results, which means way closer (and fixes the VAST majority of games), but still one bit off in some edge cases.

Add/Sub/Mul are PS2 perfect.

@seta-san
Copy link
Contributor

I see you are well documented on the subject :) I will explain to you why this "experiment" failed.

The current Div/Sqrt system mimic the PS3 SPEs (I have more experience with that system), and they uses a IEEE "like" divisor that allowed me to test what is closer to PS2 hardware. Their rounding normalization is exactly using IEEE value from the tests I did.

The PS2 Div is a mess, they round to nearest by default, but their "rounding" is completely custom. So I took the PS3 hardware as a basis and tried to "lower" the rounding delta to see what is closer to PS2 hardware, 0x39 seemed fine, but in actual fact, that was a hack.

Changing this value introduced a slight BIAS in the result, which can affect high floats values and makes them incorrect. 0X3F was still "too high" so to speak.

Current Div/Sqrt system is exactly like a PS3 emulating the PS2 in therms of results, which means way closer (and fixes the VAST majority of games), but still one bit off in some edge cases.

Add/Sub/Mul are PS2 perfect.

Can’t argue with Sony’s own solution. Sounds like a great comment to document the solution.

@GitHubProUser67
Copy link
Author

GitHubProUser67 commented Nov 23, 2024

I would say the main issue came from the fact it was widely assumed the PS2 and PS3 shared the same "float" calculation behaviours, while in actual fact, they do not.

The IBM floats, even if not compliant to IEEE are still very similar in action.

The PS2 deviates slightly on some defined calculations. The Multiplication is flawed for instance, due to them using a booth and wallace approach (Method which cost a LOT LESS to manufacture) that was dropping bits in some cases, causing plenty of calculation deviation ( something * 1 NOT ALWAYS equal to something).

@Goatman13
Copy link
Contributor

I would say the main issue came from the fact it was widely assumed the PS2 and PS3 shared the same "float" calculation behaviours, while in actual fact, they do not.

That was the plan as far as I know. Probably someone underestimated how important 1:1 implementation was to keep compatibility. PS3 CELL implemented special Altivec/VMX mode to make PPE core vector floats compatible with SPU, but also with PS2 emulation in mind. Later it turned out that they really miscalculated importance of accurate floating point math in PS2 emulation. To the point that first fixes in ~2008-2010 were strictly per game, like "on this pc, add 0x01 to this reg, and get back execution to recompiler". Soft floats, including accurate DIV, came later, when new people were hired. :) In the end only VU1 is using real SPE, everything else use PPE VMX compatibility mode.

Current Div/Sqrt system is exactly like a PS3 emulating the PS2 in therms of results, which means way closer (and fixes the VAST majority of games), but still one bit off in some edge cases.

Add/Sub/Mul are PS2 perfect.

And that's indeed a great outcome. Nice work. :)

@GitHubProUser67
Copy link
Author

Thank you Goatman13! The PCSX2 team also gets massive credits here for their help and support, of course the goal is to be PS2 perfect on this one. There is this one game, Mortal Combat Shaolin Monks which doesn't work fully yet on the Soft-Float div due to the game doing loads of Log2 on top of existing Log2 results, leading to it relying on rounding in booth directions...

…eed-up simple calculations.

This greatly improves performance while using Soft-Floats.
@TellowKrinkle
Copy link
Member

BTW the reason macOS builds are failing is they have PCH off, which means you don't get a bunch of headers auto-included. In particular, you need an #include "common/Pcsx2Defs.h" at the top of PS2float.c to get the u32, etc defines.

…s to their respectives place.

I don't like the SIMD way of doing it, it can be slower and less practical to use (expensive casting).
@GitHubProUser67
Copy link
Author

Now the only code style remaining is the Global VU TMP.

Comment on lines +125 to +130
for (s32 i = 31; i >= 0; i--)
{
if (((value >> i) & 1) != 0)
return i;
}
return -1;
Copy link
Member

@TellowKrinkle TellowKrinkle Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (s32 i = 31; i >= 0; i--)
{
if (((value >> i) & 1) != 0)
return i;
}
return -1;
#ifdef _MSC_VER
unsigned long bit;
return _BitScanReverse(&bit, value) ? bit : -1;
#else
return value ? __builtin_clz(value) ^ 31 : -1;
#endif

And you can drop the rest of the changes and make BitScanReverse8 and clz just call this (or pick one and stop using the rest).
As a side note, not handling the value == 0 case will be a bit cheaper, and in my experience in pretty much all cases, the caller has already checked and is only calling this if they know value is nonzero, so the separate check here is just wasted cycles.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will do that when possible and push, I admit I lacked experience with the BitScan part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.